Abstract: Fitts' law is a fundamental tool in measuring the
capacity of the human motor system. However, it is, by definition,
limited to aimed movements toward spatially expanded
targets. We revisit its information-theoretic basis with the goal
of generalizing it to unconstrained trained movement such as
dance and sports. The proposed new measure is based on a
subject's ability to accurately reproduce a complex movement
pattern. We demonstrate our framework using motion-capture
data from professional dance performances.

Index Terms: Fitts' law, information capacity, human motor
system, human-computer interaction

I. Introduction 

The purpose of the human motor system is to transform
electro-chemical signals in the nervous system into physical
movement. The dominant paradigm for studying the information
capacity of the human motor system is based on the
pioneering work by Paul Fitts in the 1950s [6], [7], [16]. Its
primary application is the analysis of user interfaces in human-computer
interaction pO| , p3| , [17]; it was, for instance, one
of the main drivers in the development and adoption of the
computer mouse [3].

Fitts was interested in aimed movements, i.e., movements
where a pointer (finger, eye fixation, arm, mouse cursor, etc.)
is moved on top of a spatially expanded target. A common
example is moving a mouse cursor on top of a button on a
computer display. Fitts' law describes the observation that
the relationship between movement time MT and the spatial
characteristics of the required movement is characterized as:

$$ MT = a + b \log_2\left(\frac{D}{W} + 1\right), \qquad (1) $$

where D is the distance from the starting point to the center 
of the target and W is the width of the target; a and b are 
empirical parameters determined by the task, the pointing 
device, and the performer. MT is typically measured in an 
empirical procedure involving rapid responses to spatial targets 
with experimenter-controlled characteristics. 
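As a minimal sketch of Eq. (1), the predicted movement time can be computed as below. The sketch assumes the Shannon form of the index of difficulty, log2(D/W + 1); the function names and the parameter values `a` and `b` are hypothetical and chosen only for illustration.

```python
import math


def index_of_difficulty(distance, width):
    """Index of difficulty in bits, assuming the form log2(D / W + 1)."""
    return math.log2(distance / width + 1.0)


def movement_time(a, b, distance, width):
    """Predicted movement time MT = a + b * ID (a in seconds, b in seconds per bit)."""
    return a + b * index_of_difficulty(distance, width)


if __name__ == "__main__":
    # Hypothetical parameters: a = 0.1 s, b = 0.25 s/bit; target 300 units away, 20 units wide.
    print(movement_time(0.1, 0.25, distance=300, width=20))
```

Halving the target width or doubling the distance raises the index of difficulty, and the predicted time grows linearly in that index.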

The information-theoretic basis of Fitts' law is centered
around the tradeoff between the speed and accuracy of movements
produced by the motor system. In information-theoretic
terms, the capacity of the motor system as a channel of
communication is limited by this tradeoff. Since physical
movements are naturally measured on a continuous (spatial)
scale, the measurement of their information content must
involve the determination of their accuracy or, as Fitts points
out,

"[s]ince measurable aspects of motor responses,
such as their force, direction, and amplitude, are
continuous variables, their information capacity is
limited only by the amount of statistical variability,
or noise, that is characteristic of repeated efforts to
produce the same response." [6]

The information-theoretic interpretation of Fitts' law [6],
Q' @' GiD' GZ)' where information throughput is formalized
in terms of a standard Gaussian channel (see [4]), has been
immensely popular since it enables the comparison of performance
across situations with different characteristics. The
index of performance (IP) defines the information throughput
in units of bits per second (bps):

IP = 1/b. (2) 

IP is argued to be a good metric because, as observed by Fitts,
and later many others, it tends to stay relatively constant over
a broad range of values of D and W [15], [17], providing a
natural basis for comparison of pointing devices. The mouse,
for example, typically reaches ca. 4 bps, and the joystick ca. 2
bps [15].
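To illustrate how IP is obtained in practice, the following sketch fits a and b by ordinary least squares and reports IP = 1/b as in Eq. (2). It assumes the index of difficulty log2(D/W + 1); the function name `fit_fitts` and the synthetic measurements are hypothetical, not data from any experiment.

```python
import math


def fit_fitts(distances, widths, times):
    """Least-squares fit of MT = a + b * ID; returns (a, b, IP) with IP = 1 / b in bits per second."""
    ids = [math.log2(d / w + 1.0) for d, w in zip(distances, widths)]
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(times) / n
    # Slope and intercept of simple linear regression of MT on ID.
    sxx = sum((i - mean_id) ** 2 for i in ids)
    sxy = sum((i - mean_id) * (t - mean_mt) for i, t in zip(ids, times))
    b = sxy / sxx
    a = mean_mt - b * mean_id
    return a, b, 1.0 / b


if __name__ == "__main__":
    # Synthetic, noise-free measurements generated with a = 0.1 s and b = 0.25 s/bit.
    distances = [100, 300, 700, 1500]
    widths = [100, 100, 100, 100]
    times = [0.35, 0.60, 0.85, 1.10]
    print(fit_fitts(distances, widths, times))
```

With real measurements the fit would be noisy, and b (hence IP) should be read together with its uncertainty.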

The motivation for the present work is that important aspects
of the information potential of the human motor system are not
covered by the Fitts' law paradigm, and that consequently,
the capacity of the human motor system is systematically
underestimated (insofar as the said paradigm even attempts
to estimate the capacity of the whole motor system). In fact,
Fitts' law and its generalizations are constrained to aimed
movements of one (or few) body part(s) in target conditions
that are prescribed to a high degree by the experimenter. This
has three important implications. Firstly, the "information"
that is being measured is tantamount to the subject's ability
to motorically conform to extrinsic constraints, excluding
entirely free movement, i.e., movement produced irrespective
of its absolute position with respect to perceivable environmental
constraints. Such movements are important in many
skilled activities, such as dance and sports. The issue of
underestimation is exacerbated by the empirical paradigm,
which utilizes very simple repetitive movements with simple
trajectories (see [1]). Secondly, Fitts' law does not account for
information in simultaneous movement of multiple body parts
(for an exception, see [12]). There are 640 muscles, 200-300
joints, and 206 bones in the human body. Obviously we are 
not able to independently control each one of them, but some 
separation is possible; for instance, the thumb and the index 
finger can be moved relatively independently of each other 
and the three other fingers [8]. Thirdly, most skilled activities 
involve compound tasks, with multiple aimed and other types 
of movement performed simultaneously and sequentially. Due
to these three limitations, we argue that the Fitts' law paradigm
is not suitable for the study of skilled motor actions, i.e.,
precisely the ones that can be expected to contain the most
information!

Extending Fitts' definition, we define information capacity 
in terms of the ability to accurately reproduce any previously 
performed movement pattern. An infant is a good example of 
low information capacity. At any moment in time, the infant's 
movement can appear complex, but the fact that he or she 
cannot reproduce it at will means that the motor system lacks 
the information capacity to do so. 

Our formulation is based on subjects performing arbitrarily 
complex un-prescribed movements; Fitts' paradigm, involving 
only experimenter-defined pointing tasks, is a special case. 
The formulation can accommodate movement of any duration 
and composition and involving contributions of any part of the 
body. 

The rest of the paper is organized as follows. In Sec. II
we describe a measure of shared information between two
movement sequences. The data and the preprocessing steps
are detailed in Sec. III, and the results of the experiments are
summarized in Sec. IV. To conclude, in Sec. V we discuss
potential applications and outline future work.



II. Information Measure 

To quantify the information capacity, it is necessary to
separate the controlled aspects of the performed sequence of
movements from the unintentional aspects that are unavoidably
present in all motor responses. As discussed above, the
strictly defined range of admissible performances in Fitts'
paradigm has a similar function: it rules out apparently complex,
uncontrolled (random) sequences of movements. Instead
of restricting the allowed movements, we propose to solve
this task by having a sequence repeated as exactly as possible
by the same subject. This makes it possible to obtain an
estimate of the variability of the two patterns, and subtract
the complexity (entropy) due to it from the total complexity
of the repeated performance. In other words, information is
measured by two aspects of the performance: i) the complexity
of a movement pattern, and ii) the precision with which it can
be repeated. To clarify, we let the complexity of a sequence
be given by its entropy.¹

A. The One-Dimensional Case 

For simplicity, we start by treating the one-dimensional case
where each movement sequence is characterized by a single
measurement per time frame. Let $x = x_{-1}, \ldots, x_n$ denote a
sequence where $x_t$ gives the value of the measured feature
at time $t \in \{-1, \ldots, n\}$. We start the sequence from $x_{-1}$
instead of $x_1$ for notational convenience: the first two entries
guarantee that an autoregressive model with a look-back (lag)
of two steps can be fitted to exactly $n$ data points. Similarly,
we denote by $y = y_{-1}, \ldots, y_n$ another movement sequence
of the same length as $x$.

We assume that both $x$ and $y$ follow a second-order
autoregressive model

$$ x_t = \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \epsilon_t^{(x)}, \qquad (3) $$

$$ y_t = \eta_0 + \eta_1 y_{t-1} + \eta_2 y_{t-2} + \epsilon_t^{(y)}, \qquad (4) $$

where $\beta_0, \beta_1, \beta_2$ and $\eta_0, \eta_1, \eta_2$ are real-valued parameters to
be tuned using least squares. The second-order model accounts
for the basic physical principle that once the movement
vector (including direction and velocity) is specified, constant
movement contains no information whatsoever.

The errors (or innovations) $\epsilon_t^{(x)}$ and $\epsilon_t^{(y)}$ are assumed to be
zero-mean Gaussian random variates. Since the two sequences
are supposed to be repetitions of the same movement pattern,
we let $\epsilon_t^{(x)}$ and $\epsilon_t^{(y)}$ be correlated with some correlation
coefficient $\rho \in (-1, 1)$. The innovations for different time
frames $t \neq t'$ are assumed to be independent of each other.

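Having fitted the parameters to observed sequences, we
obtain the residuals

$$ r_t^{(x)} = x_t - \hat{x}_t = x_t - (\hat{\beta}_0 + \hat{\beta}_1 x_{t-1} + \hat{\beta}_2 x_{t-2}), \qquad (5) $$

$$ r_t^{(y)} = y_t - \hat{y}_t = y_t - (\hat{\eta}_0 + \hat{\eta}_1 y_{t-1} + \hat{\eta}_2 y_{t-2}), \qquad (6) $$

where $\hat{x}_t$ and $\hat{y}_t$ denote the predicted values based on the least
squares estimates $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2$ and $\hat{\eta}_0, \hat{\eta}_1, \hat{\eta}_2$, respectively.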
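The fitting step can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the code used for the experiments: the helper names `solve_linear` and `fit_ar2` are hypothetical, and the least-squares problem is solved through the normal equations, which is adequate for a three-parameter model but not the most numerically robust choice.

```python
def solve_linear(a, b):
    """Solve the small dense system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_ar2(seq):
    """Least-squares AR(2) fit with intercept.

    seq holds n + 2 values (two lead-in entries followed by n data points).
    Returns ([b0, b1, b2], residuals) with exactly n residuals.
    """
    rows = [[1.0, seq[t - 1], seq[t - 2]] for t in range(2, len(seq))]
    targets = seq[2:]
    # Normal equations (X^T X) beta = X^T x.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(3)]
    beta = solve_linear(xtx, xty)
    residuals = [v - sum(bc * rc for bc, rc in zip(beta, r))
                 for r, v in zip(rows, targets)]
    return beta, residuals
```

Applying `fit_ar2` separately to the two repetitions yields the two residual sequences whose variances and correlation enter the information estimates.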

Under the model (3)-(4), the (differential) entropy of each of the
sequences can be estimated by plugging the residual variance
into the familiar formula for the Gaussian entropy (see [4]):

$$ h(x) \approx \frac{n}{2} \log_2(2\pi e \sigma_x^2), \quad h(y) \approx \frac{n}{2} \log_2(2\pi e \sigma_y^2), \qquad (7) $$

where $\sigma_x^2 = \sum_{t=1}^{n} (r_t^{(x)})^2 / n$ is the residual variance of $x$ and
$\sigma_y^2$ is defined similarly.
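A minimal sketch of this plug-in estimate follows, assuming the entropy of a length-n sequence is n/2 times the per-frame Gaussian term; the function names are hypothetical. Note that a differential entropy can be negative when the residual variance is small, which is one of the caveats of interpreting it as a number of bits.

```python
import math


def residual_variance(residuals):
    """Mean squared residual (the residuals of a fit with intercept have zero mean)."""
    return sum(r * r for r in residuals) / len(residuals)


def gaussian_entropy_bits(residuals):
    """Plug-in differential entropy of the whole sequence: (n / 2) * log2(2 * pi * e * variance)."""
    n = len(residuals)
    return 0.5 * n * math.log2(2.0 * math.pi * math.e * residual_variance(residuals))
```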

The mutual information between the movement sequences,
which gives the reduction in bits in the entropy of one
sequence when we are given the other, is now fully determined
by the residuals, and in particular, their correlation $\rho$:

$$ I(x; y) = -\frac{n}{2} \log_2(1 - \rho^2). \qquad (8) $$
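For readers who want the intermediate step, the expression follows from the standard entropy formulas for univariate and bivariate Gaussians. The derivation below is a sketch under the model assumptions stated above (jointly Gaussian innovations with variances $\sigma_x^2$, $\sigma_y^2$ and correlation $\rho$, independent across the $n$ frames); it is not taken from the original text.

```latex
\begin{align*}
h(\epsilon^{(x)}) &= \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma_x^2\right), \qquad
h(\epsilon^{(y)}) = \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma_y^2\right), \\
h(\epsilon^{(x)}, \epsilon^{(y)}) &= \tfrac{1}{2}\log_2\!\left((2\pi e)^2\,\sigma_x^2\,\sigma_y^2\,(1-\rho^2)\right), \\
I(\epsilon^{(x)}; \epsilon^{(y)}) &= h(\epsilon^{(x)}) + h(\epsilon^{(y)}) - h(\epsilon^{(x)}, \epsilon^{(y)})
  = -\tfrac{1}{2}\log_2(1-\rho^2), \\
I(x; y) &= n \cdot I(\epsilon^{(x)}; \epsilon^{(y)}) = -\tfrac{n}{2}\log_2(1-\rho^2).
\end{align*}
```

The variances cancel, so only the correlation matters. For example, $\rho = 0.9$ gives roughly 1.2 bits per frame.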

¹ In the case of continuous signals, we continue to do so, keeping in mind the
caveats associated with the interpretation of differential entropy; see, e.g., [4,
Chapter 8].

However, since we do not in general know the true correlation
coefficient, we need to estimate it from the data. Plugging in the
empirical correlation coefficient amounts to fitting a parameter
to a finite sample, and hence our mutual information estimate based on
it will tend to be too high. (For instance, even if the true
correlation is zero, we will almost always get an estimate that is
greater than zero.) There are various ways to compensate for
this bias. We adopt an approach similar to Rissanen's classic
two-part approximation to the stochastic complexity [13],
whereupon the estimated mutual information becomes

$$ \hat{I}(x; y) = -\frac{n}{2} \log_2(1 - \hat{\rho}^2) - \frac{1}{2} \log_2 n, \qquad (9) $$

where $\hat{\rho}$ is the empirical correlation coefficient of the residuals, and
the last term will act to overcome the overestimation of
the mutual information due to fitting the correlation parameter
to a finite amount of data (see, e.g., [5] for many interesting
properties of the stochastic complexity formula; those familiar
with the concept may notice that our penalty term is equal to
$\frac{k}{2} \log_2 n$ with $k = 1$ parameters).
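The penalized estimate can be sketched as follows, assuming it takes the form -(n/2) log2(1 - r^2) - (1/2) log2 n with r the empirical correlation of the two residual sequences. The function names are hypothetical, and the sketch does not guard against a perfect correlation (|r| = 1), where the logarithm is undefined.

```python
import math


def correlation(u, v):
    """Empirical (Pearson) correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def mutual_information_bits(res_x, res_y):
    """Penalized estimate: -(n/2) * log2(1 - r^2) - (1/2) * log2(n)."""
    n = len(res_x)
    r = correlation(res_x, res_y)
    return -0.5 * n * math.log2(1.0 - r * r) - 0.5 * math.log2(n)
```

For uncorrelated residuals the first term vanishes and the estimate equals the negative penalty, which is how negative values can arise.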

The mutual information has a direct interpretation in terms 
of the reduction in bits required to encode the sequence x due 
to the side information y being available. Since the mutual 
information in x and y excludes, with high probability, most 
of the uncontrolled movements and inaccuracies which tend 
not to be repeated when the movement is performed twice, we 
argue that it provides a measure of the controlled information 
in x. To achieve high mutual information, a movement has to 
be both complex and accurately controlled so that it can be 
repeated with high precision. 

Finally, we define the observed throughput in a sequence x 
conditioned on sequence y as the estimated mutual informa- 
tion per second: 



$$ TP(x \mid y) = \frac{R}{n} \hat{I}(x; y) = -\frac{R}{2} \log_2(1 - \hat{\rho}^2) - \frac{R}{2n} \log_2 n, \qquad (10) $$

where $R$ denotes the frame rate (frames per second).

B. The Multidimensional Case



When handling $p$-dimensional sequences, $p > 1$, where
each time frame $x_t$ is composed of $p$ measured components
(features), $x_t = (x_t^{(1)}, \ldots, x_t^{(p)})$, it is not sufficient to simply
sum up the information throughput in each of the components
separately. This would exaggerate the throughput, since
redundant information that is contained in more than one
component would be counted several times.

To reduce the effect of redundant information shared be- 
tween features, we decorrelate the features. To this end, we 
first perform principal component analysis (PCA) on move- 
ment sequence x. We then transform both sequences to obtain 
two new time series, x' and y' where each frame in each 
sequence is obtained by a linear transformation (the same one 
for both x and y) of the corresponding frame in the original 
sequence. Typically most of the variance in the new sequences 
is focused on a fraction of the principal components, and we 
retain only as many as are required to cover 90 percent of 
the variance (of x). The newly obtained lower-dimensional 
sequences are then analysed using the technique described 
above, and the throughputs are summed up. 
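The last two steps can be sketched as below: choosing how many principal components to retain, and summing the per-component throughputs. The sketch assumes the PCA itself has already been carried out elsewhere, so that the component variances and the per-component residual sequences are available. All function names are hypothetical, and the per-component throughput is taken to be the frame rate times the penalized mutual information estimate divided by n.

```python
import math


def components_to_keep(variances, fraction=0.9):
    """Smallest number of leading components whose variance covers `fraction` of the total."""
    total = sum(variances)
    covered = 0.0
    for k, v in enumerate(sorted(variances, reverse=True), start=1):
        covered += v
        if covered >= fraction * total:
            return k
    return len(variances)


def throughput_bps(res_x, res_y, frame_rate):
    """Throughput of one component in bits per second: (R / n) * estimated mutual information."""
    n = len(res_x)
    mean_x, mean_y = sum(res_x) / n, sum(res_y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(res_x, res_y))
    sxx = sum((a - mean_x) ** 2 for a in res_x)
    syy = sum((b - mean_y) ** 2 for b in res_y)
    r = sxy / math.sqrt(sxx * syy)
    info = -0.5 * n * math.log2(1.0 - r * r) - 0.5 * math.log2(n)
    return frame_rate * info / n


def total_throughput_bps(res_x_components, res_y_components, frame_rate):
    """Sum of per-component throughputs over the retained (decorrelated) components."""
    return sum(throughput_bps(rx, ry, frame_rate)
               for rx, ry in zip(res_x_components, res_y_components))
```

Summing is only meaningful after decorrelation; applied to raw, correlated features it would count shared information more than once.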



TABLE I

Summary of the data used in the experiments.

 #   Label                                                        Frames
 1   adagio (temps lié, arabesque, pas de bourrée, balancé - )      4254
 2   - " -                                                          4459
 3   tombé pas de bourrée, Italian fouetté, piqué turn, jeté en     4001
     tournant
 4   - " -                                                          3724
 5   petit jeté (glissade jeté, ballotté, ballon, entrechat,        1535
     assemblé)
 6   - " -                                                          1574
 7   grand jeté (battement développé, chassé, grande jeté           1560
     développé, arabesque, fouetté sauté, jeté en tournant)
 8   - " -                                                          1621
 9   petit jeté (tendu croisé, sissonne devant fermée, derrière     1091
     fermée, sissonne ouvert pas de bourrée)
10   - " -                                                          1114


III. Data and Preprocessing 

In order to study unconstrained performances without limit- 
ing ourselves to specific tasks or parts of the body, we analyse 
motion capture data. Motion capture data is typically obtained 
by recording a subject by a set of cameras, and using special- 
purpose image processing technologies to convert the recorded 
video into variables such as 3D coordinates or angles of joints 
(wrists, elbows, shoulders, waist, knees, etc). 

For our experiment, we recorded the performance of a
professional dancer performing movement sequences of her
own choice. The recording and motion capture analysis were
performed at the Perception, Action and Cognition Lab, University
of Glasgow; see Table I and Fig. 1. The sequences were
recorded at a frame rate of 120 frames per second. For each frame, the
data contains $p = 111$ features, corresponding to the three-dimensional
coordinates of 37 markers attached to different
parts of the body.

The inherent problem in predicting one motion sequence by
another is the possible misalignment of the sequences in time.
Usually, even very carefully repeated movements are slightly
out of synchronization, and hence when predicting the $t$th
frame of sequence $x$, the most useful frame of sequence $y$
may not be the $t$th frame but the $(t + \delta)$th one with $\delta \neq 0$.
Therefore, it is necessary to align the two sequences to obtain
a better synchronization.

We aligned each pair of sequences in the data set by
applying Canonical Time Warping (CTW)² [18], a state-of-the-art
technique for aligning sequences describing human behavior.
CTW uses the more traditional Dynamic Time Warping
(DTW) [11] as an initial solution but improves it by adopting
features from Canonical Correlation Analysis (CCA) (see pi).
This allows alignment based on a more flexible concept of
similarity than usually used in DTW.

² Matlab code is available at www.humansensing.cs.cmu.edu/projects/ctwCode.html

Fig. 1. Data collection procedure. LEFT: An example of a motion capture situation on video. RIGHT: A visualization of the captured pose.

The result of a pairwise alignment of two sequences, with
possibly different lengths, is a new pair of aligned sequences
whose lengths are equal, such that each frame in one sequence
matches as well as possible with the same movement (similar
measured features) in the other. To achieve this, the CTW
algorithm duplicates some of the frames in each sequence so
as to "slow down" the sequence in question at suitable points;
see the example in Fig. 2. When measuring the throughput,
we skip the duplicated frames in sequence $x$ in order to avoid
unnecessarily magnifying their impact. Hence, if frame $t$ is
duplicated in sequence $x$ so that in the aligned sequence,
$x'$, frames $t$ and $t + 1$ are identical, we skip the $(t + 1)$th
frame (of both $x'$ and $y'$) when evaluating the throughput,
Eq. (10). The sequences were also normalized so that each
feature has mean zero and unit variance. It is important to
also note that we compute the residuals of both sequences
from the unaligned sequences where there are no duplicate
frames. However, the alignment is done based on the actual
sequences (not the residuals).
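The alignment and frame-skipping logic can be illustrated with plain Dynamic Time Warping on one-dimensional sequences. This is a simplified stand-in, not CTW and not the Matlab code referenced above; the function names `dtw_path` and `drop_duplicated_x_frames` are hypothetical.

```python
def dtw_path(x, y):
    """Classic DTW with squared-difference cost; returns the aligned index pairs (i, j)."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return path


def drop_duplicated_x_frames(path):
    """Keep only the first aligned pair for each frame of x, skipping its duplicates."""
    kept = []
    previous_i = None
    for i, j in path:
        if i != previous_i:
            kept.append((i, j))
        previous_i = i
    return kept


if __name__ == "__main__":
    path = dtw_path([0, 1, 2], [0, 1, 1, 2])
    print(path)
    print(drop_duplicated_x_frames(path))
```

In the example, the middle frame of the first sequence is matched to two frames of the second, and only the first of those two pairings is kept for evaluation.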

As an undesirable consequence of the use of alignment 
methods in preprocessing the motion capture data, we lose the 
information about the temporal accuracy of the movements. 
Clearly, a significant amount of controlled information is
required for timing the motor responses. Working with aligned
sequences, there is no way to measure the accuracy to which 
the repeated performance is synchronized with the original 
performance. One possibility is to examine the alignment itself 
to see how much information is required to bring the two 
sequences in close agreement, and to add this information 
to the information content due to spatial accuracy. We will 
explore this issue in further work. 

IV. Results and Discussion 

Table II lists all the throughput values for each pair of
movement sequences corresponding to the same movement
pattern; see Table I. Of all the pairwise throughput values,
TP(x | y), the highest one, 1653 bits per second (bps),
is obtained for sequence 8 conditioned on sequence 7; see
Fig. 2. Their similarity is easily confirmed visually from the
video recordings and the animated reconstructions available
(not shown). The values are nearly symmetric: the throughput
in sequence 7 conditioned on sequence 8 is 1580 bps. The
lowest throughput, 640 bps, was observed for sequence 1
conditioned on sequence 2.

As a sanity check, we also evaluated the throughput for pairs
of sequences that were not repetitions of the same movement
pattern. As expected, the obtained throughput values are all
very small or even negative.³



³ Negative values are possible due to the second term, $-\frac{1}{2} \log_2 n$, in Eq. (9).
In terms of the Minimum Description Length (MDL) Principle [5], [13], this
would be taken to indicate that a model where $x$ and $y$ are independent is
superior to the model where they are correlated via the innovation sequences.
Note that this is equivalent to model selection using the Bayesian Information
Criterion (BIC) [15].



[Figure 2: two panels titled "Sequence 7" and "Sequence 8"; horizontal axis: Frames.]

Fig. 2. Two motion capture sequences (sequences 7 and 8, see Table I),
plotted after alignment. Note the high similarity of the two
sequences.



TABLE II

Measured throughput values for the sequences listed in Table I.

  x     y    TP(x | y)
  1     2     640 bps
  2     1     668 bps
  3     4    1408 bps
  4     3    1481 bps
  5     6     931 bps
  6     5     914 bps
  7     8    1580 bps
  8     7    1653 bps
 11    12     763 bps
 12    11     756 bps


V. Conclusions and Future Work 

The experiment we have described demonstrates the 
main idea in our framework, i.e., extending the prevailing 
information-theoretic framework to allow completely uncon- 
strained movements, and thereby, to determine the maximum 
of the achievable information capacity. Motion capture data 
provides the best way to characterize such movements in a 
way that does not rule out any potentially informative aspects 
in them. 

That said, it will be interesting to compare the capacity 
estimates obtained by other methods, such as pointing devices 
(the traditional tool in Fitts' paradigm), data gloves, etc., and 
to see if the earlier results are replicated. For instance, it
would be interesting to see if more information can be extracted
from Fitts' original reciprocal pointing task by recording the 
movements by a data glove or motion capture: the question is 
whether the path along which the hand operating the pointer 
moves between the two targets carries additional information 
beyond the information provided by the end-points, and if it 
does, how much. 

Achieving the goal of constructing a complete and reliable 
measure of information capacity will lead to a wealth of useful 
knowledge about the human motor system. Concrete utility is 
to be seen, for instance, in the study of novel human-computer 
interfaces that involve free whole-body expression. Possible 
applications in sports science include training of complex mo- 
tor schemas with reference models. Potential new diagnostic 
tools based on monitoring changes in the information capacity 
of the motor system may offer great societal value through 
early identification of neurological disorders related to motor 
dysfunction and in monitoring recovery of neuroplasticity after 
lesions. We will explore these lines of research in further work. 